Finding evolutionarily conserved cis-regulatory modules with a universal set of motifs – Supplementary Materials

Authors

  • Bartek Wilczyński
  • Norbert Dojer
  • Mateusz Patelak
  • Jerzy Tiuryn
Abstract

where P(w|M) is the probability of observing w given the motif model (drawn from the frequency matrix) and P(w|B) is the probability of observing w given the background model (estimated from the sequence). All subwords w satisfying L_M(w) > t_M are classified as M-occurrences. There are two standard approaches to choosing the threshold t_M [1]. The first aims to restrict the number of false positive motif occurrences: for an assumed type I error level α_1, t_M is chosen to satisfy P(L_M(w) > t_M | B) = α_1. Its disadvantage is poor control over the classification of true M-occurrences. The second approach restricts the number of false negatives by choosing t_M to satisfy P(L_M(w) < t_M | M) = α_2 for an assumed type II error level α_2. Unfortunately, it gives up control over the number of false positives, and consequently produces a significant disparity between the numbers of predicted instances of strong and weak motifs (i.e. motifs that are easy or hard to discriminate from the background). As explained in the main text, our method of CRM identification takes into account both positive and negative signals from the promoter sequence. The control of the two error types in motif prediction therefore has to be balanced, in the sense that the number of false positives should be of the same order as the number of false negatives. Following the approach proposed in [1], we set the threshold t_M satisfying the equation
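The two standard threshold choices described above can be illustrated with a small Python sketch. It is not the authors' implementation and it does not cover the balanced threshold, whose equation is not reproduced here. It assumes that L_M(w) is the log-likelihood ratio log(P(w|M) / P(w|B)), that the background is a zero-order model, and that the motif is short enough to enumerate every word. Because the score distribution is discrete, the equalities with α_1 and α_2 are relaxed to "at most". All function names and the toy matrix are hypothetical.

```python
import itertools
import math


def score_distribution(pwm, background):
    """Enumerate every word w; return (L_M(w), P(w|M), P(w|B)) triples.

    Assumption: L_M(w) = log(P(w|M) / P(w|B)), with a zero-order background.
    pwm[i][c] is the probability of base c at position i (no zero entries).
    """
    rows = []
    for word in itertools.product(range(len(background)), repeat=len(pwm)):
        p_m = math.prod(pwm[i][c] for i, c in enumerate(word))
        p_b = math.prod(background[c] for c in word)
        # Round so that words with equal scores compare equal despite float noise.
        rows.append((round(math.log(p_m / p_b), 9), p_m, p_b))
    return rows


def _scan(rows, weight_index, alpha, descending):
    """Walk the distinct scores in order, accumulating probability mass.

    Returns the last score t for which the mass strictly beyond t is <= alpha.
    """
    ordered = sorted(rows, key=lambda r: r[0], reverse=descending)
    mass, best = 0.0, None
    for t, group in itertools.groupby(ordered, key=lambda r: r[0]):
        if mass > alpha:
            break
        best = t
        mass += sum(r[weight_index] for r in group)
    return best


def type1_threshold(rows, alpha1):
    """Smallest attainable t with P(L_M(w) > t | B) <= alpha1."""
    return _scan(rows, weight_index=2, alpha=alpha1, descending=True)


def type2_threshold(rows, alpha2):
    """Largest attainable t with P(L_M(w) < t | M) <= alpha2."""
    return _scan(rows, weight_index=1, alpha=alpha2, descending=False)


# Toy 3-column motif over A, C, G, T (hypothetical values, for illustration only).
toy_pwm = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
]
uniform_background = [0.25, 0.25, 0.25, 0.25]

rows = score_distribution(toy_pwm, uniform_background)
t_type1 = type1_threshold(rows, alpha1=0.05)
t_type2 = type2_threshold(rows, alpha2=0.05)
print("type I threshold:", t_type1)
print("type II threshold:", t_type2)
```

For this toy motif the two criteria give clearly different thresholds, which is the disparity between false positive and false negative control that the text describes. Exhaustive enumeration grows as 4^k with motif length k, so a real application would need a more efficient way to obtain the score distribution.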

Similar resources

TRES: comparative promoter sequence analysis

Comparative promoter analysis is a promising strategy for elucidation of common regulatory modules conserved in evolutionarily related sequences or in genes showing common expression profiles. To facilitate such analysis, we have developed a software tool that detects conserved transcription factor binding sites, cis-elements, palindromes and k-tuples simultaneously in a set of sequences, and t...

CREME: a framework for identifying cis-regulatory modules in human-mouse conserved segments

MOTIVATION The binding of transcription factors to specific regulatory sequence elements is a primary mechanism for controlling gene transcription. Recent findings suggest a modular organization of binding sites for transcription factors that cooperate in the regulation of genes. In this work we establish a framework for finding recurrent cis-regulatory modules in the promoters of a selected se...

Unraveling transcriptional control in Arabidopsis using cis-regulatory elements and coexpression networks.

Analysis of gene expression data generated by high-throughput microarray transcript profiling experiments has demonstrated that genes with an overall similar expression pattern are often enriched for similar functions. This guilt-by-association principle can be applied to define modular gene programs, identify cis-regulatory elements, or predict gene functions for unknown genes based on their c...

Prediction of similarly-acting cis-regulatory modules by subsequence profiling and comparative genomics in D. melanogaster and D. pseudoobscura

Motivation: To date, computational searches for cis-regulatory modules (CRMs) have relied on two methods. The first, phylogenetic footprinting, has been used to find CRMs in non-coding sequence, but does not directly link DNA sequence with spatio-temporal patterns of expression. The second, based on searches for combinations of transcription factor (TF) binding motifs, has been employed in geno...

Publication date: 2008